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A  state  of  the  art,  fully  functional,  multi-scale  and  multi-cdicmnel 
segmentation  tool  has  been  developed.  It  is  based  on  the  recently  developed 
computational  theory  of  the  2-nonnal  segmentations.  A  fast  multi-scale 
pyramidal  algorithm  has  been  designed  and  applied  to  the  theoretical  varia¬ 
tional  segmentation  model  of  Mumford-Shah.  This  algorithm  has  a  multi-channel 
capability,  as  veil  as  a  much  more  general  class  of  solutions.  Namely,  a 
piecevrise  polynomial  segmentation  is  natural  to  the  pyramidal  multi-channel 
framevoric.  The  piecewise  affine  segmentation  has  been  Implemented  and  tested. 
Application  specific  channels  include:  gray  scale  information,  two-dimensional 
wavelet  channels  for  texture  discrimination,  and  multi-scale  singular  feature 
channels.  The  accuracy  of  the  pyramidal  segmentation  algorithm  has  been 
experimentally  compeured  to  the  accuracy  of  two  other  modem  segmentation 
algorithms.  Ihe  performance  of  the  pyramidal  algorithm  has  shown  an  average 
four-fold  reduction  in  error  measurenents. 

Computational  experiments  with  reconnaissance  photography,  multi-sensor 
satellite  imagery,  medical  CT  and  MRI  multi-band  data  have  shown  a  great  practi¬ 
cal  potential  of  this  novel  technique.  Preliminary  experimentation  in  clutter 
removal  via  multi-channel  segmentation  points  to  a  totally  new  class  of  feature¬ 
preserving  decluttering  algorithms. 
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1  Introduction. 

Image  processes  today  are  often  presented  with  multiple  data  of  the  same  scene  obtained 
using  various  sensors.  Typical  examples  include  satellite  pictures  using  infrared,  optical  and 
other  multispectral  data  or  medical  data  such  as  MR  images  taken  under  different  protocols 
(proton,  T2,  ...)  which  give  complementary  physiological  and  anatomical  information  in 
brain  images.  It  thus  seems  quite  natural  to  include  this  information  in  image  processing 
algorithms.  This  not  only  should  improve  analysis  of  the  data  but  should  also  help  to  get 
more  stable  algorithms  as  there  is  more  information  available. 

Segmentation  algorithms  are  basic  tools  in  the  extraction  of  global  features  out  of  digitized 
data.  This  report  presents  the  first  fully  functional  multi-channel  segmentation  tool  based  on 
a  fast  multiscale  and  pyramidal  algorithm.  The  input  for  the  different  channels  is  obtained 
by  using  data  from  different  sensors  and  by  preprocessing  some  of  these  channels  with  the 
two  dimensional  wavelet  transform  combined  with  a  texture  dicrimination  algorithm  and  a 
multiscale  edge  detector. 

This  report  presents  the  mathematical  background  of  Cognitech ’s  segmentation  algorithm. 
The  algorithm  itself  is  discussed  in  section  3.  Computational  examples  are  presented  and 
discussed  in  sections  4  and  5. 
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2  Computational  theory  of  segmentation  devices 

General  principles  upon  which  the  method  used  at  Cognitech  is  based  are  stated  and  a 
variational  formulation  of  the  segmentation  problem  will  be  given.  A  sketch  of  the  recently 
developed  computational  theory  of  the  so-called  “2-normal”  segmentations  will  be  given. 
More  (theoretical)  details  are  discussed  in  [MorS,  KoeMS,  Koe]. 

1.  Cognitech  has  first  of  all  designed  a  universal  boundary  detection  device,  definable  and 
analyzable  independently  from  the  number  and  kind  of  channels  (grey  level,  colour, 
texture,  features,  etc. . . )  to  be  used  as  input  for  the  segmentation  ([BecSI,  MaliP2,  Ju]). 
(By  “boundary  in  an  image”,  the  boundary  in  the  topological  sense  is  meant:  it  is  the 
boundary  between  regions  of  a  partition  of  the  picture.  Thus  boundaries  are  different 
from  edges  obtained  by  some  local  filtering.) 

2.  A  second  principle,  adopted  in  the  following,  is  that  an  algorithm  for  segmentation 
should  be  multiscale.  Indeed  boundaries  may  appear  on  some  scale  and  not  be  relevant 
on  another.  It  follows  that  a  sequence  of  scale  parameters  A,  defines  a  corresponding 
sequence  of  segmentations  and  each  segmentation  captures  objects  of  a  specific  geomet¬ 
rical  scale. 

3.  Finally  a  comparison  principle  is  adopted.  It  states  that  given  two  different  segmen¬ 
tations  of  a  datum,  it  is  always  possible  to  decide  which  one  of  them  is  considered  as 
better  than  (or  equivalent  to)  the  other.  Thus  the  existence  of  some  total  ordering  over 
all  possible  segmentations  is  assumed,  and  this  can  be  simply  achieved  if  this  ordering  is 
reflected  by  some  real  functional  E.  Namely  if  E{K\)  <  E{A’2),  then  the  segmentation 
K\  has  to  be  considered  “better”  than  the  segmentation  K2-  For  instance,  this  principle 
is  satisfied  by  segmentation  devices  based  on  Gibbs  energy  functional  as  in  [GemG].  It  is 
not  satisfied  by  the  region  growing  methods  based  on  thresholds  [Pavl,  Pav2,  PavL,  Zu], 
nor  by  the  edge  local  detection  devices  [Marr,  MaliPlj. 

One  defines  an  image  5  as  a  scalar  or  vector  function,  defined  on  the  image  domain  fl 
(generally  a  rectangle).  One  seeks  a  segmentation,  that  is,  a  partition  of  this  rectangle  into  a 
finite  set  of  regions,  each  of  which  corresponds  to  a  part  of  the  image  where  3  is  as  regular  as 
possible.  Moreover,  one  wishes  to  compute  explicitly  the  region  boundaries  and,  of  course,  to 
control  their  regularity  and  location.  Since  by  the  comparison  principle,  all  of  these  criteria 
must  be  taken  into  account  in  the  functional  E,  this  functional  necessarily  contains  terms 
which  control  : 

-  the  similarity  of  each  region  with  respect  to  the  chosen  channels. 

-  the  size,  location  and  regularity  of  the  boundaries. 

The  model  is  a  special  case  of  the  Mumford  and  Shah  (MS)  variational  model  (see[MumSl, 
MumS2]).  According  to  this  model  the  segmentation  (u,  K)  should  be  obtained  by  minimizing 
the  functional 

Eiu,K)=  /  (u-gfclcdy  +  Xe{K), 

Jq\k 


(1) 
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where  A"  is  a  union  of  boundaries  in  Q  with  HausdorfF  length  £(K),  and  u  is  piecewise  constant 
on  n  \  A.  The  constant  A  represents  the  scale  parameter  of  the  functional  and  measures  the 
length  of  the  boundary  :  if  A  is  small,  many  boundaries  are  allowed  and  one  gets  a  fine 
segmentation.  As  A  increases,  the  segmentation  gets  coarser  and  coarser. 

Mumford  &  Shah  [MumSl]  and  Morel  &  Solimini  [MorS]  have  proved  that  either  the 
boundary  K  is  regular  (at  least  or  it  has  singular  points  of  two  types,  namely,  triple  points 

where  three  branches  meet  at  120°  angles  or  boundary  points  where  K  meets  the  boundary  dSl 
of  fl  at  a  90°  angle.  Similar  functionals  have  also  been  studied  by  [DeG,  AmbDeG,  CarDeGL] 
in  order  to  model  physical  phenomena  of  phase  transitions  [FoT]  and  liquid  crystals.  We  follow 
a  computational  approach  of  [MorS]  for  the  reduced  formulation  (1).  Presently  the  full  (MS) 
model  has  no  constructive  solution  in  multi-dimensions. 

Functional  (1)  represents  a  compromise  between  accuracy  of  the  regions  and  parsimony 
of  the  boundaries.  In  case  of  grey  level  segmentation  a  pure  region  growing  would  simply  put 
together  the  pixels  with  similar  grey  levels  [Zu,  HarSj.  The  above  procedure  generates  very 
nonsmooth  boundaries,  too  “small”  or  too  “thin”  regions.  Thus,  if  no  control  is  made  on  the 
boundaries,  one  needs  additional  criteria  and  thresholding  to  achieve  a  decent  segmentation. 
A  functional  such  as  (1)  is  designed  to  avoid  this  kind  of  ad-hoc  treatment.  One  hopes 
to  have  all  the  criteria  assembled  in  the  same  functional.  Many  early  functionals  in  image 
analysis  had  only  two  dimensional  energy  terms,  coupled  with  some  thresholding  criteria  (see 
[Pavl,  Zuj)  and  no  global  approximation  and  convergence  results  were  available  until  now. 

It  is  well  known  that  functionals  such  as  (1)  may  have  many  local  minimizers.  A  search 
for  global  extrema  leads  to  an  NP-complete  problem. 

One  has  thus  to  choose  between  two  strategies  : 

•  The  global  minimization  is  performed  by  simulated  annealing  methods,  which  leads  to 
huge  computational  complexity,  but  ensures  that  the  global  minimum  is  attained  [GemGj. 

•  Alternatively,  one  can  define  a  concept  of  a  “good”  local  minimum  which  should  be 
more  accessible  to  fast  computations. 

Recently,  even  users  of  simulated  annealing  have  developed  a  tendency  to  define  a  fast  ap¬ 
proximation,  which  need  not  find  a  global  minimum  [Azj.  The  homotopy  method  of  Blake 
and  Zisserman  [BlakZ]  also  seeks  a  “good”  local  minima. 

The  following  notions  are  basic  to  the  analysis  of  the  algorithm.  A  subsegmentation  of 
a  segmentation  A  is  a  segmentation  obtained  by  merging  an  arbitrary  number  of  adjacent 
regions  of  K. 

Definition  1  A  segmentation  K  will  be  called  2-normal  if  for  every  pair  of  regions  O,  and 
Oj,  the  new  segmentation  IC  obtained  by  merging  these  regions  satisfies  E{K')  >  E{K). 

Theorem  1  The  set  of  2-normal  affine  segmentations  has  the  following  compactness  prop¬ 
erty:  for  every  sequence  /v'„  of  such  segmentations,  there  exists  a  subsequence  converging  to 
a  segmentation  K  such  that 

A(A)<liminf  A(A„). 

n 

A'  is  not  necessarily  2-normal,  bui  it  has  a  2-normal  subsegmentation  with  lower  energy. 
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Compactness  is  a  weak  form  of  uniqueness  :  it  indicates  that  the  set  of  2-normal  segmen¬ 
tations  is  much  smaller  than  the  set  of  all  possible  segmentations  (which  is  not  compact). 

It  also  provides  a  very  important  computational  property  :  If  one  defines  an  algorithm 
computing  better  and  better  solutions,  that  is  solutions  with  decreasing  energy,  these  solutions 
must  accumulate  around  “good”  local  minima  of  the  functional  (1). 


3  A  pyramidal  algorithm  constructing  2- normal  affine  seg¬ 
mentations. 

Consider  now  the  problem  of  defining  and  computing  a  sequence  of  recursive  mergings  re¬ 
sulting  in  a  good  2-normal  segmentation.  Notice  that  not  all  2-normal  segmentations  are 
interesting;  for  instance,  the  empty  segmentation,  where  ft  is  the  single  region  is  clearly  a 
2-normal  segmentation. 

Assume  that  the  datum  g  is  defined  on  a  rectangle.  This  rectangle  is  divided  into  small 
squares  of  constant  size  (the  pixels)  and  g  is  assumed  to  be  constant  on  each  pixel.  The 
properties  required  for  the  segmentations  computed  by  a  region  growing  algorithm,  defined 
as  an  application  associating  to  g  and  A  a  segmentation  (u,  K)  are: 

a)  “Correctness”  (Fixed  point  property)  :  Assume  that  g  is  piecewise  constant  on  some  areas 

of  the  rectangle.  Then  there  exists  a  value  Aq  of  the  parameter  A  such  that  for  every 
A  <  Aq,  the  segmentation  (u,  K)  obtained  by  the  algorithm  satisfies  u  =  g  and  K  is  the 
union  of  the  boundaries  of  the  areas  where  g  is  constant. 

b)  “Causality”  (Pyramidal  segmentation  property) :  If  A  >  A',  then  the  boundaries  provided 

by  the  algorithm  for  A  are  contained  in  those  obtained  for  A'  and  the  areas  of  the 
segmentation  associated  with  A  are  the  unions  of  some  of  the  areas  obtained  for  A'. 

The  last  property  ensures  that  a  fast  pyramidal  algorithm  can  be  implemented,  by  computing 
a  hierarchy  of  segmentations  from  fine  to  coarse  scales.  Moreover  the  coarser  segmentation 
will  be  deduced  from  the  finer  by  “merging”  operations,  with  a  pyramidal  structure  for  the 
computation.  Note,  that  as  a  consequence  of  the  fixed  point  property,  if  A  is  very  small,  the 
computed  segmentation  is  attained  with  {uq^Kq)  where  uo  =  u  and  Kq  consists  of  all  the 
boundaries  of  all  the  pixels,  and  therefore  coincides  with  the  global  minimum  for  A  equals 
zero.  We  shall  call  this  segmentation,  where  each  pixel  is  a  region,  the  “trivial  segmentation”. 
The  recursive  merging  algorithm  used  at  Cognitech  satisfies  all  the  above  properties. 

3.1  Principles  of  the  segmentation  algorithm 

The  algorithm  is  based  on  the  classical  technique  of  region  growing,  or  merging.  Starting  with 
an  initial  partition  (for  example  pixel  by  pixel,  which  wiU  be  called  the  trivial  segmentation) 
one  merges  those  regions  which  satisfy  an  homogeneity  criterion.  One  hopes  to  get  in  this 
manner  homogeneous  regions  with  respect  to  the  criterion. 
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The  algorithm  starts  with  a  partition  of  the  images  into  square  regions  which  can  be  of 
1x1,  2x2,  4x4  pixels.  At  this  stage,  all  boundaries  between  regions  are  therefore  horizontal 
or  vertical.  Presently  the  final  segmentation  consists  of  patches  of  pixels.  The  boundary  K 
falls  in  between  image  pixels 

The  initial  segmentation  has  a  certain  number  of  channels  which  are  computed  in  each 
square  region.  A  channel  contains  some  information  which  is  associated  with  each  region,  such 
as  its  area,  its  mean  grey  level,  its  mean  gradient,  or  (for  texture  analysis)  its  mean  oscillation 
at  a  given  scale.  Boundaries  and  vertices  of  the  segmentation  can  also  be  associated  with 
several  channels.  For  a  boundary,  the  channel  can  be  the  length,  a  measure  of  the  contrast 
(i.e.  jump  across  the  boundary),  . . .  For  a  vertex,  the  channel  can  be  a  contrast  measure, 
which  e.g.  gives  a  measure  of  whether  the  point  can  be  a  crossing  point  or  not.  All  of  the 
channel  information  will  contribute  to  give  a  measure  of  the  homogeneity  of  regions. 

An  important  thing  to  keep  in  mind  is  that  once  channels  have  been  associated  with  a 
region,  the  single  pixels  of  the  region  are  “forgotten”  and  only  the  value  of  the  channels  in 
the  region  will  influence  the  merging  process.  This  explains  the  strength  and  speed  of  the 
algorithm,  since  there  are  many  fewer  channels  than  pixels.  Moreover,  the  only  operations 
to  be  made  on  the  channels  when  merging  are  additions.  Also,  this  is  true  for  all  kinds 
of  channels  involved  in  piecewise  polynomial  or  piecewise  textural  segmentation.  Thus  the 
algorithm  is  universal  and  independent  of  the  meaning  and  number  of  the  channels. 

3.2  Algorithmic  model  and  data  structure  design. 

The  representation  of  a  partition  of  a  picture  is  based  on  the  following  structures: 

•  Region  :  described  by  its  channels  (a  list  of  real  numbers),  and  its  borders  with  adjacent 
regions. 

•  Border  :  described  by  its  own  channels  and  the  two  adjacent  regions,  and  by  a  list  of 
its  connected  components. 

•  Connected  component  of  border  :  given  by  its  two  vertices  and  by  a  chain  of  pixels 
joining  them. 

•  Vertex  :  described  by  the  list  of  its  adjacent  borders,  its  coordinates  and  its  channels. 

Every  object  is  defined  by  some  proper  information  (channels,  coordinates)  and  its  relations 
to  other  objects.  For  example  fi,  in  figure  1(a),  is  a  picture  made  of  three  regions  Qi,  Q2  and 
Q3. 

Let  us  recall  that  dili  denotes  the  boundary  of  a  region  ili  and  d(ili,Q2)  is  the  common 
boundary  of  and  Q2-  We  assume  that  the  external  boundary  of  (i.e.  dft)  is  never  part 
of  the  boundaries  of  the  regions  which  partition  the  image. 

Denote  by  Ci,i  =  1,..  .,4  connected  boundary  elements,  then  d^li  =  (ci  U  C4)  U  C2,  dil2  = 
(ci  U  c^)  U  C3  and  dils  =  Ci  U  C4. 

With  the  above  conventions  the  region  Dj  has  two  borders  d{Cli,il2)  and  5(Di,n3),  with 
d{ili,^l2)  having  two  connected  components  (e.g.  ci,  C4)  and  0(fii,fl3)  just  one  (e.g.  C2). 
Associated  to  these  are  vertices,  to  C2  vertices  S2  and  S3.  Vertex  Si  has  one  adjacent  border 
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whereas  vertex  52  has  two  adjacent  borders  5(ni,fi2) 


an  an 


Figure  1:  Example  of  regions  before  (a)  and  after  (b)  merging. 

The  result  of  merging  regions  fli  and  Q2  >s  represented  in  figure  1(b).  To  achieve  this 
one  has  to  erase  5(fli,fl2),  which  means  the  two  connected  components  Ci  and  C4  and  the 
vertices  Si ,  S2,  S3,  S4.  The  frontier  of  the  new  region  ft(i2)  with  fla  wiU  have  a  single  connected 
component  obtained  by  merging  C2  and  C3. 

The  implementation  in  the  programming  language  C  uses  structured  data  types  with 
dynamical  memory  allocation.  A  region  will  be  a  variable  of  strucured  type  REGION,  made 
of  an  array  of  real  numbers  and  pointers  to  a  list  of  variables  of  type  called  BORD,  which 
represents  the  boundary  between  two  regions. 

3.3  Description  of  the  algorithm. 

Initialization.  After  calculating  the  channels  and,  as  stated  above,  taking  an  initial  seg¬ 
mentation  made  of  a  single  pixel,  the  entire  data  structure  is  set  up.  The  regions  are  put 
into  a  linked  list.  The  present  choice  has  been  to  create  a  list  of  all  initial  regions  as  fol¬ 
lows:  left-right  and  up-down.  This  structure,  convenient  from  a  programming  point  of  view, 
does  introduce  some  bias.  For  a  multiscale  segmentation  the  series  Ai,  A2, . . . ,  Af,  handles  the 
growth  of  the  scale  parameter. 

The  criterion.  The  decision  to  proceed  to  a  merging  of  two  regions  O,  and  Oj  depends  on 
the  sign  of  E{u,  K  \  d{Oi,  Oj))  —  E{u,  K).  The  algorithm  looks  for  a  decrease  of  global  energy 
by  merging  these  regions.  This  is  the  criterion  of  2-normal  segmentations  introduced  in 
section  2.  The  simplified  Mumford  and  Shah  model  is  implemented  by  choosing  the  following 
energy  functional  : 

E(u,K)=  /  ||«-«,||2  -H  Af(A'). 

Ju 

Here  5  is  a  vector  valued  function,  whose  components  are  different  channels,  defined  on  the 
rectangle  fi,  u  is  the  approximating  vector  function,  and  K  is  the  set  of  boundaries  with  total 
length  £{K).  As  in  the  piecewise  constant  case  u  is  the  mean  value  of  g  on  each  region  O. 
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The  above  functional  is  just  denoted  by  E(K),  i.e.  to  obtain  (u,  K)  one  needs  to  know  only 
K.  Then  the  fusion  criterion  is 

E{K  \d(Oi,Oj))  -  E(K)  =  •  ll«.-  -  “ill"  -  X-i{d{Oi,Oj)), 

where  |.|  is  the  area  measure  and  u,  the  approximation  of  g  on  Oi.  When  g  is  scalar  the  norm 
||.||  is  just  the  absolute  value.  For  multichannel  data  a  weighted  norm  ||.||  is  used.  It  is  specific 
to  each  application  and  to  the  meaning  of  the  different  channels.  This  will  be  emphasized  in 
the  next  section. 

To  get  the  necessary  data  for  evaluating  the  criterion  the  following  information  has  to  be  used: 
Suppose  g  =  ‘(5*, . .  then  every  region  O  has  a  channel  with  its  area  \0\  and  n  channels 

Cq  =  /  (/  =  1, . . . ,  n).  These  yield  the  values  for  u  restricted  to  O  :  uq  =  ‘(uq,  . . . ,  Uq) 

Jo 

by  simply  computing  Uq  =  to  get  the  mean  value. 

The  channels  of  region  Onew  obtained  by  merging  O,  and  Oj  are  given  by  |Onew|  =  |0«|  +  \Oj\ 

+COj  ,  (/  = 

Thus  a  merging  of  two  regions  just  implies  adding  the  corresponding  channels  and  updating 
the  data  structure  (i.e.  pointer/address  affections). 

It  is  straightforward  to  use  the  same  model  for  higher  order  polynomials  just  by  changing 
the  criterion.  To  get  a  fast  algorithm  one  just  ensures  the  same,  additive,  property  of  the 
channels  (see  next  section). 

The  algorithm. 

(i)  Take  the  segmentation  {uq,Ko)  resulting  from  the  initialization  and  A,  =  Ai  as  a  scale 

parameter. 

(ii)  Scan  the  list  of  regions  and  for  every  candidate  region  look  for  the  adjacent  region  which 
yields  the  besc  merging  score  (i.e.  the  maximal  energy  decrease).  If  such  a  region  exists 
proceed  to  merge  and  update  the  data.  The  next  region  in  the  list  becomes  a  candidate 
for  merging. 

For  X  =  Xi  fixed,  repeat  the  scanning  of  the  picture  until  there  is  no  merging  possible. 
After  this  step,  a  2-normal  subsegmentation  (u,,  Ki)  of  the  initial  segmentation  for  scale 
parameter  A^  is  achieved. 

(in)  For  every  A,,  i  —  calculate  a  2-normal  segmentation  by  iterating  step  (ii). 

The  algorithm  stops  if  there  is  just  one  region  left  or  after  computing  the  last  scale 
parameter  A^,  a  2-normal  segmentation  («/.,  A’i). 

Result.  The  intermediate  segmentations  obtained  at  each  scale  A,  is  saved  before  sweeping 
to  A.+j.  Each  ouput  yields  the  following  objects  :  connected  boundaries  K,  the  approximation 
u,  perimeter  and  area  of  each  region  O.  Each  of  the  above  computed  objects  represents 
an  important  feature  oriented  information  about  the  analyzed  picture.  See  the  numerical 
experiments  below  to  illustrate  the  applications  of  these  objects. 
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Remark.  This  way  of  sweeping  a  list  of  regions  is  not  optimal  because  the  best  merging 
is  done  by  searching  over  the  whole  list.  This  slight  bias  is  however  compensated  by  the 
multiscale  structure  of  the  merging  process.  The  algorithm  begins  with  a  small  value  of  A 
and  increases  it  when  all  possible  mergings  have  been  done.  This  multiscale  loop  orders  the 
mergings  by  order  of  priority  with  respect  to  the  functional.  A  time  expensive  track  keeping 
of  a  chart  with  the  best  merging  out  of  all  adjacent  regions  would  be  prohibitively  expensive. 
The  compactness  property  assures  a  “good”  local  minima  solution. 


3.4  The  use  of  multiple-channels. 

The  piecewise  constant  channel  case  has  been  explained  above.  Similar  local  average  channels 
can  be  used  to  get  a  piecewise  affine  segmentation  of  the  original  data.  Recall  that  input 

data  consists  of  region  area  channels  {|Oi|}  and  channels  |cq^  =  J 


Piecewise  affine  segmentation. 


One  keeps  the  length  term  constant  and  considers  how  the  integral  term  changes  in  this  case. 
Let  u(i,  y)  =  ax  +  by  +  c  he  the  function  which  approximates  g  on  region  O  in  a  sense  of  the 
least  square  mean  error 


y)-ax  -by  -  c]'^dxdy  . 


(2) 


Accordingly  the  coefficients  a,  b  and  c  (depending  on  0)  are  solution  of  the  following  normal 
linear  system: 


^  f  x^  f  xy  f  X 

f^y  fy^  fy 

\  fy  /I 


a  \  I  f9x\ 

<>  =  /</y 

^  /  \  f9  ) 


The  information  needed  to  compute  a,  b  and  c  is  contained  in  the  following  10  integral  terms: 

fo  1  (=  !o  So  y^  So  So  ^y^  So  y^^  fo  9>  Jo  9^,  Jo  yy^  fo  9^- 


Note  that  the  pyramidal  nature  of  the  merging  is  still  preserved  here.  Simply  declare  10 
channels  for  each  region  and  use  these  channels  to  represent  the  necessary  data  for  piecewise 
affine  segmentation.  It  is  not  necessary  to  compute  (2)  on  a  new  region  obtained  by  a 
merging.  The  moments  listed  above  will  gi-e  the  necessary  information  without  “going 
back”  to  the  original  picture,  hence  the  pyramidal  structure.  The  coefficients  a,  b  and  c  can 
be  computed  if  one  needs  the  reconstruction  «,  but,  for  obtaining  getting  the  partition  of  the 
picture,  coefficients  a,  b,  c  are  not  even  needed.  In  figure  2  the  result  of  piecewise  constant 
segmentation  is  compared  to  a  piecewise  affine  segmentation  on  a  synthetic  original  picture 
made  of  affine  elements.  The  original  piecewise  affine  image  (i)  yields  a  piecewise  constant 
segmentation,  represented  by  its  boundaries  (ii)  and  the  approximation  (iii).  A  piecewise 
affine  segmentation  gives  more  accurate  partition  (iv)  and  approximation  (v). 

The  first  line  represents  the  original  picture,  and  the  second  line  shows  the  boundaries 
K  and  the  reconstruction  u  using  a  piecewise  constant  model.  The  third  line  shows  both  K 
and  u  for  a  piecewise  affine  model. 

Improvements  of  this  program  and  for  higher  degree  polynomials  may  be  developed  using 
ENO-subcell  methods  which  are  successfully  used  for  image  compression  at  Cognitech. 
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Applying  a  wavelet  transform. 

Differential  or  semilocal  integrodifferential  operators  provide  useful  analysis  of  pictoral  in¬ 
formation.  The  2-D  wavelet  transform  is  used  here  for  such  analysis.  It  is  based  on  the 
pyramidal  algorithm  of  Stephane  Mallat  [Mall]  using  the  multiresolution  approximation  (see 
[Mey])  formulation.  The  bidimensional  decomposition  uses  two  monodimensional  decompo¬ 
sitions  (  following  the  X ,  Y  axis). 

Suppose  Vj  is  a  multiresolution  of  Z/^(IR),  with  associated  orthonormal  basis  and  Wj 

the  orthogonal  complement  of  Vj  in  Vj+i,  with  orthonormal  basis  Then  Vj  =  Vj®Vj 

is  a  multiresolution  analysis  of  Z,^(1R^).  Let  us  denote  by  Wj  the  orthogonal  complement  of 
Vin  V.+i, 

•  {4P^{x)4>{^{y)) ^  ^  is  an  orthonormal  basis  of  Vj  . 

•  is  an  orthonormal  basis  of  Wj. 

The  approximation  of  picture  g  at  resolution  2-'  is 

Sj  =  (Sj{n,Tn))n.m  ,  with  Sj(n,m)  :=  2-’  <  >  ■ 

The  difference  of  information  from  Sj+i  to  Sj  is  given  by  three  “details” 

D]  =  {D]{n,m))n.m  ,  with  D]{n,m)  :=  2^  <  g,4>iipin  >  , 

D]  =  {D]{n,  m))n,rn  ,  with  D'j{n,Tn)  :=  2-’  <  g,  >  , 

=  (Djin,m))n,m  ,  with  D^(n,m)  :=  2-’  <  >  • 

The  Sj,  Dj,  Dj  and  Dj  can  be  deduced  from  5j+i  as  follows 

Sj{n,m)  =  Sj+i(k,l)h(k  -  2n)h(l  -  2m) 
kJeTl, 

Dj(n,m)  =  E  Sj+\{k,l)h{k  -  2n)f{l  -  2m) 
k.iea 

Dj{n,m)  =  Sj+i{k,l)f{k  —  2n)h(l  ~  2m) 

k,le2l 

Dj{n,m)  =  Sj+i{k,l)f{k-2n)f{l~2m) 

fc.iez 

The  functions  /  and  h  are  quadratic  mirror  filters,  related  to  Fourier  transforms  of  (j)  and  0 
and  satisfy  /(n)  =  (-l)^’~"^/i(l  -  n). 

The  -Y,  Y  separation  of  the  decomposition  in  the  2-D  case  implies  that  J5]  represents 
the  loss  of  information  in  the  Y  direction  by  passing  from  5j+i  to  Sj,  Dj  corresponds  to 
the  horizontal  information  loss  and  Dj  is  the  crossed  information  loss  at  scale  2^.  Figure 
3  shows  the  classical  representation  of  a  wavelet  decomposition,  using  Daubechies’  compact 
supported  wavelets,  computed  at  level  j  =  -1,-2. 

The  filters  used  are  typically  the  Haar  basis  or  Daubechies’  basis  which  have  a  finite 
number  of  non-zero  cefficients. 

Texture  channel  computation. 

Denote  by  F,  the  output  obtained  by  wavelet  filter,  nonlinear  rectification  by  taking  negative 
and  positive  part  of  F,  yields  two  channels  R2i  =  and  R2,+i  =  F~ .  This  step  is  required 
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Figure  ‘i-D  wavelet  Transform. 


for  texture  segmentation.  I  Ik'  particularity  of  the  presented  algorithm  is  that  all  the  infor¬ 
mation  eom|)uted  from  the  initial  picture  is  kept  and  not  just  the  most  imi)ortant  (oscillant) 
channel  for  each  region. 

Indeed  the  preceding  steps,  whose  aim  it  is  to  obtain  texture  discriminating  information, 
are  under  different  guises  cpiite  widely  spread  in  text ure  segmentation  algorithms.  However 
most  of  these  algorithms  are  unable  to  obtain  a  segmentation  from  mnlti-cliannel  input, 
so  they  have  to  restrict  themselves  to  one  information  channel.  Moreover  they  choo.se  the 
maximum  res|)onding  channel  to  characterize  a  region.  It  seems  (piite  obvious  that  two 
adjacetit  regions  can  have  the  same  “maximum"  channel  without  having  comparable  responses 
in  other  channels  so  that  they  are  di.scriminable  but  “maximnm-r('si)onse-algorithms"  will  not 
notice  the  different  behaviour  in  non  extremal  channels. 

Figure’  1  shows  an  example’  of  te’Xture’ eliscrimiiiation  using  only  information  provieleel  by 
waveh’t-channe’ls.  I’lie  in|)Ut  is  a  pair  e)f  te’xtures  (“Bre^elatz-textnres")  anei  the  output  shows 
the  bounelary  between  the  texture’s  eliscriminate'el  by  the  wavelet  anel  the  scales  which  have 
be’en  useel  (  :i  scales,  llaar  basis). 

Multiscale  edge  detection. 

do  analyse  the  effe’ct  of  local  features,  via  the  ai)|)licat ion  of  multiscale  singularity  detectors 
(see  [Ruel]).  aelelitional  channels  are  cre’ateel.  This  cla.ss  of  channels  provides  information 
which  is  application  specific.  For  example  the  user  may  search  for  regions  whose  edges  satisfy 
s|)ecial  geoim’t  ric  i)roperties.  such  as  linearity  or  circularity  on  a  part  icular  scale.  'Fhe  feature 
channel  can  tnk(’  a  bit  maj)  form  as  presc’iited  in  figure.')  (see  also  figure  12(ii)  for  the  original 
data). 
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Figure  4:  Brodatz-texture  discrimination  using  Haar  basis  and  3  scales. 


Figure  5:  Example  of  edge  map. 
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3.5  Details  on  implementation. 

The  program  is  written  in  C,  using  dynamic  representation  for  the  data.  The  file  StructSeg.h 
contains  the  definitions  of  all  data  structures,  InitSeg.h  and  CanalSeg.h  contain  all  proce¬ 
dures  necessary  for  the  initialization  of  the  topological  model  on  the  initial  grid,  as  well  as 
the  computation  of  the  channels.  MergeSeg.h  contains  aU  functions  updating  the  topological 
structure  of  regions,  borders  and  vertices  when  a  merging  is  activated. 

The  output  of  the  obtained  segmentation  is  made  by  functions  specific  to  an  application. 
The  program  uses  MEGAWAVE^  and  IMAGE^  libraries  for  image  file  handling. 

Size  of  the  code  and  of  the  used  RAM  during  the  run  :  around  60  kB  (2000  lines).  The 
choice  being  to  have  a  fast  procedure,  but  only  if  a  lot  of  RAM  is  avalaible,  several  error 
messages  are  included  in  the  files  to  warn  against  a  lack  of  available  RAM.  Furthermore,  a 
procedure  called  Estimation  gives  the  estimated  memory  size  necessary  at  the  beginning  of 
execution. 

The  data  structure.  (Confer  file  StructSeg.h) 

In  order  to  represent  computationally  the  topological  relations  of  regions,  borders  and  ver¬ 
tices,  structured  variables  available  in  the  C  language  are  used.  The  data  structure  has  been 
partially  represented  in  figure  6.  One  sees  that  this  data  structure  is  absolutely  faithful  to 
the  topological  relations. 

Since  the  number  of  borders  corresponding  to  a  given  region  or  a  given  vertex  changes 
during  the  region  growing,  all  border  handling  is  made  by  the  use  of  a  linked  list.  Thus  the 
datatype  Li_boundaries  ( list  of  borders)  is  introduced.  The  type  Li.pixels  (List  of  pixels) 
allows  to  define  a  path  from  a  vertex  to  another. 

A  remark  on  the  orientation  of  borders. 

Each  border  is  given  by  its  adjacent  regions,  the  right  region  rd  and  the  left  region  rg.  It  is 
thereafter  decomposed  into  its  connected  components.  Each  one  of  them  is  oriented  in  such 
a  way  that  the  left  region  rg  be  always  left  and  the  right  region  rd  always  right  of  the  path 
when  the  path  is  scanned  from  vertex  sa  towards  vertex  sb. 

Let  us  however  recall  that  borders  and  vertices  are  in  fact  virtually  defined  between  the 
pixels  of  the  image.  Thus  no  pixel  is  associated  with  a  boundary  or  a  vertex.  In  order  to 
print  or  display  a  border,  one  has  to  make  a  choice  for  the  pixels  which  will  represent  it.  One 
allows  therefore  a  1/2  pixel  error,  since  boundary  and  vertex  are  put  into  the  right  below 
adjacent  region.  Another  possibility  is  to  create  an  edge  map  of  size  (2dx  -  l,2dy-  1),  where 
the  “border”  pixels  are  plotted  in  between  the  “image”  pixels  (both  are  implemented). 

Model  initialization. 

It  has  two  different  parts.  The  first  deals  with  the  construction  of  all  initial  regions,  borders, 
connected  components  of  borders,  and  vertices,  and  with  their  connections.  This  initialization 
is  fully  independent  of  the  merging  criteria.  The  second  part  of  the  initialization  is  to  compute 
all  channels  for  initial  regions,  borders  and  vertices.  This  part  is  easy  to  change  and  guarantees 
are  large  variety  of  applications. 


of  CEREMADE,  University  of  Paris-Dauphine 
of  Cognitech,  Inc.  Santa  Monica 
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( Prec.  region) 


Figure  6:  Representation  of  the  data  structure. 


Requirements  about  the  RAM  and  the  computing  time  necessary  for  a  fast  execution. 
Assuming  an  initial  grid  of  dimensions  {gx,gy),  there  are  gx  -gy  initial  regions,  gx{gy—  1)  + 
gy{gx  -  1)  borders  and  {gx  4-  1)(5J/  +  1)  —  4  vertices.  In  general  s  =  gx  =  gy  =  128,  256 
or  512,  therefore  a  square  image  with  approximately  regions,  2s*  borders  and  (s  +  1)* 
vertices.  Denote  by  NCR,  NCB  and  NCS  the  number  of  channels  by  region,  border  and 
vertex  respectively.  This  is  the  memory  involved  for  the  different  structures  : 


Regions 


Borders 


Connected  components 


Vertices 


List  of  borders 


List  of  pixels 


(NCR  +  3)s 


8(NCB  +  3)s 


4(s+l)'‘-(NCS  +  2) 


Thus  around  200s*  +  4{NCR  +  2NCB  +  NCS)s^  bytes  for  the  initial  segmentation.  If  one 
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wants  to  process  a  512  x  512  pixels  image  with  an  initial  grid  2x2  and  48  texture  channels, 
one  obtains  400s^,  that  is,  with  s  =  256,  25  megabyte.  Estimates  on  the  computation  time 
with  48  channels  and  s  =  256  give  a  time  of  2  minutes  on  Cognitech’s  76  MIPS  HP730 
workstation.  This  is  good  for  an  experimental  device. 


4  Measuring  the  accuracy  of  pyramidal  segmentation. 

Modern  applications  of  image  processing  require  high  degree  of  accuracy  during  the  feature 
extraction  stage.  This  should  be  also  true  for  segmentation  analysis.  Instances  of  the  impor¬ 
tance  of  accurate  region  segmentation  are  found  in  such  distinct  fields,  as  the  nondestructive 
material  testing  via  imaging,  automated  imaging  quality  control  and  the  construction  of  ac¬ 
curate  physical  models  of  patient  anatomy  for  surgical  planning.  Such  accuracy  analysis  is 
notably  missing  from  the  current  computer  vision  literature  works  on  segmentation  tech¬ 
niques.  However,  some  study  was  done  by  the  medical  imaging  researchers  [MRE].  Here  we 
shall  follow  their  method  of  benchmarking  segmentation  accuracy  through  experiments  with 
the  Robertson  phantom,  as  explained  below. 


Robertson’s  phantom  on  the  figure  7  is  a  125mm  diameter  cylinder  containing  4  different 
materials  as  depicted  on  this  figure.  Eight  machined  cylindrical  steps  are  cut  through  the 
innermost  material.  Prior  to  assembly,  the  true  inner  radii  were  measured.  This  experiment 
uses  images  obtained  by  the  CT  cross  sectioning  of  the  phantom.In  all,  eight  cross  sections  im¬ 
ages  of  the  annular  phantom  were  segmented  with  a  three  different  segmentation  techniques. 
First  and  second  segmentation  algorithms  are  variants  of  the  commonly  used  contour  tracing 
technique  [Pav3].  The  accuracy  in  localization  and  radius  estimation  of  these  methods  were 
compared  to  the  accuracy  of  Cognitech’s  pyramidal  segmentation. 


The  experiment  is  illustrated  by  showing  two  of  the  cross-sections  on  the  figure  8(i),(ii) 
with  different  inner  radii.  The  8(iii),  (iv)  show  corresponding  segmentations.  Two  inner  radii 
are  measured  and  compared  to  a  known  true  radii.  The  plot  of  the  radius  versus  the  angular 
position  for  all  three  techniques  is  presented  on  the  figure  9.  Note  here  that  while  the  pyra¬ 
midal  segmentation  oscillates  around  the  true  mean  value  radius,  the  competing  algorithms 
are  consistently  above  or  below  the  true  value.  The  performance  of  the  pyramidal  algo¬ 
rithm  demonstrated  an  average  four-fold  reduction  in  error  measurement  of  the  inner  circle! 
The  bar  diagram  on  page  19  visualizes  this  comparison  of  the  error  in  inner  and  outer  radius. 


Modifications  of  the  present  pyramidal  algorithm  will  be  proposed  in  order  to  obtain 
accuracy  in  the  variance  of  the  geometrical  boundaries  of  extracted  regions. 
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Figure  7:  Scheme  of  the  phantom  used  for  accuracy  tests. 


Figure  9:  Accuracy  comparison  between  different  estimation  methods  of  the  inner  (i)  and  the 
outer  (ii)  radius. 


PERCENT  ERROR  Radius  (mm) 


Confidential  proprietary  COGNITECH,  INC. 


19 


MEASURED  RADII 


ERROR  IN  INNER  AND  OUTER  RADIUS 


PHANTOM  RING  (thfckness/radius) 
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5  Experimental  results. 

Reconnaissance  photography. 

Figure  10  depicts  multiscale  segmentation  of  the  single  frame  256*256  B&W  8-bit  “Chemical 
plant”  (i).  The  sequence  of  three  scales  progressively  remove  small  scale  features,  thus  un¬ 
covering  such  objects  of  interest  as  roads  and  buildings  in  the  figures  lO(ii)  and  lO(iii).  At 
the  end  only  the  main  storage  facility  remains  as  a  region  in  the  figure  lO(iv). 


To  illustrate  effects  of  segmentation  on  the  p-w  constant  reconstruction,  consider  figure 
ll(i)  of  the  “Desert  airplane”.  A  choice  of  the  scale  gives  the  segmentation  of  figure  ll(ii). 
Figure  ll(iii)  is  the  corresponding  p-w  constant  approximation.  Such  reconstruction  is  use¬ 
ful  if  a  shading  has  to  be  removed.  Notice  that  this  segmentation  draws  a  box  around  the 
aircraft. 

To  introduce  a  true  multichannel  segmentation,  a  multisensor  satellite  set  “Cairo”  was 
obtained  via  Los  Alamos  Nat.  Lab.  Figure  12(i)  and  12(ii)  were  collected  with  two  distinct 
spectra  devices.  A  fusion  of  a  multisensor  data  is  required  since  complementary  information 
was  captured.  Figure  12(iii)  is  the  two  channel  fusion.  If  in  addition  edge  channels  are  fused, 
a  quite  different  segmentation  figure  12(iv)  results  (the  edge  map  of  12(ii)  is  shown  in  figure 
5).  The  implication  of  this  example  is  a  new  way  to  fuse  such  distinct  channels  as  infrared, 
optical,  laser  range. 
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Figure  12:  F.xaniple  of  multiple  cliainiel  segmentation  of  the  multisensor  satellite  image  pair 
“Cairo”. 
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Medical  imaging  applications. 

Rapid,  accurate  image  segmentation  can  be  applied  to  radiological  imaging.  Cognitech’s 
algorithm  offers  a  high  degree  of  noise  and  threshold  immunity  and  allows  multi-band  data  , 
thus  making  it  ideal  for  the  tasks  of  multi-echo  MRI  sequences. 

Figure  13(i)  is  256*256  grey  scale  MRI  reconstruction  of  the  slice  of  brain  of  the  patient  with 
tumor.  The  pyramidal  segmentation  is  applied  to  determine  quantitative  brain  anatomy  at  a 
specified  scale.  The  measurements  of  brain  structures  (area  and  surface)  pointed  with  arrows 
on  the  figure  13(ii)  are  presented  .  This  analysis  yielded  an  accurate  location  and  estimation 
of  the  tumor,  shown  here  as  a  shaded  area.  In  the  future,  Cognitech,  Inc.  will  develop 
automated  neuroanatomical  tool  bzised  on  the  presented  algorithm. 

The  set  of  figure  14(i)  and  14(ii)  is  composed  of  MR  sectional  images.  They  represent  image 
pairs  taken  under  different  MR  protocols.  The  goal  of  segmentation  is  to  combine  this  two 
sensor  input  in  order  to  separate  three  anatomical  strata:  centrally  located  "butterfly”  of  the 
CSF  (ceribro-spinal  fluid)  from  the  brains  white  matter  and  the  gray  matter. 

Figures  14(iii)  and  14(iv)  show  two  independent  segmentations,  each  having  complementary 
as  well  as  conflicting  information.  On  the  figure  14(v)  a  two-channel  segmentation  gives 
anatomically  meaningful  partition  as  well  as  accurate  length/area  measurements.  These 
results  are  presented  in  Cognitech’s  [MRKMO]  SPIE  paper. 
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Towards  clutter  removal  algorithms. 

Finally,  to  give  some  feeling  for  the  future  applications  of  the  pyramidal,  multichannel  image 
analysis,  the  single  gray  scale  channel  segmentation  of  the  figure  15  (  512  x  512  “Tank”)  is 
compared  to  the  segmentation  based  on  18  wavelet  channels.  A  grey  level  segmentation  and 
reconstruction  was  computed  on  figure  16(i)  and  16(ii)  correspondingly.  The  scale  parameter 
was  chosen  large  enough  to  eliminate  the  grassy  background.  The  reconstruction  16(ii)  shows 
that  unfortunately  most  of  the  tank’s  features  are  also  gone.  Only  gross  scale  outline  features 
were  preserved.  This  is  the  price  of  clutter  removal  with  a  single  channel  analysis. 

In  contrast  the  Haar  basis  wavelet  decomposition  as  in  section  3.4  above  followed  by  the 
separation  of  a  positive  and  negative  parts  yields  18  texture/feature  channels.  The  fusion 
of  these  18  channels  is  illustrated  on  the  figure  16(iii).  Remarkable  tank  reconstruction  and 
simultaneous  background  clutter  removal  is  presented  on  the  16(iv).  Clearly,  the  multichannel 
algorithm  was  capable  of  filtering  out  the  grassy  terrain,  while  keeping  the  essential  tank 
features,  e.g  treads.  To  our  knowledge,  this  represents  the  state  of  the  art. 


Figure  15:  Original  grey  level  image  “Tank”. 
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(f) 


(«) 


Figure  16:  Non  linear  clutter  removal  via  segmentation  with  a  single  channel  and  with  mul¬ 
tichannel  based  on  wavelet  analysis. 
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